The Scamseek Project - Text Mining for Financial Scams on the Internet
Abstract
The Scamseek project, commissioned by ASIC, has the principal objective of building an industrially viable system that retrieves potential scam candidate documents from the Internet and classifies them according to their potential risk of containing an illegal investment proposal or advice. The project produced multiple classifiers for different types of data and achieved higher than expected classification performance. Developing the system required solving two major problems in document classification: accurately identifying classes with very small footprints (below 0.1%), and classifying by meaning and intention rather than by word strings. The approach used Systemic Functional Grammar to model the semantics of the scam classes, and unigrams with significant language preprocessing to help separate out irrelevant documents. ASIC has initiated litigation on the basis of classifications made by the system, and operates the system on a 24/7 basis. The saving in human effort in ASIC's monitoring role is estimated to be of the order of 100-fold. The savings to the community cannot be readily estimated but are likely to be of the order of tens of millions of dollars.
Similar resources
Scamseek - A Language Technology Project Fulfilling Research Objectives with Industrial Obligations
The Scamseek project, as commissioned by the Australian Securities & Investment Commission (ASIC), had the principal objective of building an industrially viable system that retrieves scam candidate texts from the Internet and classifies them as to their potential risk of containing an illegal investment proposal or advice. The value of the system is the gain of significant time and efficiency ...
Topic Modeling and Classification of Cyberspace Papers Using Text Mining
Global cyberspace networks provide individuals with platforms where they can interact, exchange ideas, share information, provide social support, conduct business, create artistic media, play games, engage in political discussions, and much more. The term cyberspace has become a conventional means to describe anything associated with the Internet and the diverse Internet culture. In fact, cyberspac...
Designing a System for Trend Analysis of Users in Website Surfing in Iran Using Data Mining and Text Mining Algorithms
Background and Aim: Web surfing has become part of the lifestyle of the vast majority of people in society, creating a need for more accurate social and cultural policy making in the field. The authors therefore set out to analyze how users in the society view different websites, so as to help politicians and practitioners. Methods: A design science research method is used in this research...
Mining Interesting Aspects of a Product using Aspect-based Opinion Mining from Product Reviews (RESEARCH NOTE)
As the internet and its applications grow, E-commerce has become one of its most rapidly growing applications. E-commerce customers are given the opportunity to express their opinions about a product on the web as text in the form of reviews. In previous studies, merely finding the sentiment of reviews was not enough to obtain the exact opinion of the review. In this paper, we have used A...
A Solution for Preventing Fraudulent Financial Reporting using Descriptive Data Mining Techniques
In the present age of scams, financial statement fraud represents an enormous cost to our economy. The deliberate misstatement of numbers in the accounting books, with the help of a well-planned scheme by an intelligent squad of knowledgeable perpetrators in order to deceive capital market participants, is termed financial statement fraud. In order to reduce fraud risk which comprehends both de...
Publication date: 2006